[...]ons of VCT and the experience of practitioners remains to be crossed. Our extended
research into reductions of the sample size estimates produced by VCT and into other
improvements of the VCT arguments has yet to yield results of practical significance
that can serve as advice to neural network designers. [...]
NSF [...] forecasting [...] been more successful. The continuation of this work under
[...] resulted in a forecaster for [...]


Summary

As explained previously, at the outset of research we revised our program objectives
to concentrate on probabilistic studies of:

[the ability of an] artificial neural network (ANN) to generalize, the core of the
benchmarking problem being approached through Vapnik-Chervonenkis theory (VCT);

[...] of ANNs using discrete-valued nonlinear nodes such as the classical linear
[threshold unit];

the ability of neural networks to implement nonlinear forecasters.

Secondary objectives were to develop a graduate program in electrical engineering at
Cornell in the area of ANNs. Our ANN research group expanded to include [...] and [...]
Ph.D. students. We continue to offer a [...] surveys standard [...] completion [...]
previous interim technical report. Our study of VC theory attempted to understand its
inadequacy as a guide to practice; VC theory predicts a need for training sample sizes that
are orders of magnitude greater than those used happily by practitioners. We made several
attempts, reported in Fine [1991], to lower the VC upper bound estimates of training
sample size $n$ required to select a net $\eta^*$ whose error probability performance
$\mathcal{E}^*$ is within $\epsilon$ of the lowest error probability $\mathcal{E}^0$
achievable by nets in a given family/architecture $\mathcal{N}$ having a VC
capacity/dimension $V$; VC theory estimates
$n = O\left(\frac{V}{\epsilon^2} \ln \frac{1}{\epsilon}\right)$. Even a small net
architecture can have $V > 100$, and $\epsilon < 0.1$ is far from stringent. VC theory then
suggests that successful training will require $n = O(10^5)$, a far larger training set than
is used in all but character recognition programs.
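
As a rough numerical illustration, the following sketch evaluates the VC-type estimate, read here as $(V/\epsilon^2)\ln(1/\epsilon)$, for the values quoted in the text. The function name `vc_sample_size` and the leading constant `c = 1` are assumptions made for illustration only; the order-of-magnitude notation does not fix the constant.

```python
import math


def vc_sample_size(V, eps, c=1.0):
    """Order-of-magnitude estimate n ~ c * (V / eps^2) * ln(1 / eps).

    V   : VC capacity/dimension of the architecture
    eps : allowed excess error over the best net in the family
    c   : unspecified leading constant (assumed to be 1 here)
    """
    if V <= 0 or not (0.0 < eps < 1.0):
        raise ValueError("need V > 0 and 0 < eps < 1")
    return c * (V / eps ** 2) * math.log(1.0 / eps)


# Values quoted in the text: V = 100, eps = 0.1
n_small = vc_sample_size(100, 0.1)
# A slightly larger net and a slightly tighter tolerance
n_larger = vc_sample_size(200, 0.05)

print(f"V=100, eps=0.10 -> n ~ {n_small:,.0f}")
print(f"V=200, eps=0.05 -> n ~ {n_larger:,.0f}")
```

Even with a unit constant the estimate already exceeds twenty thousand samples for V = 100 and a tolerance of 0.1, and it grows quickly as V increases or the tolerance tightens.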

Our failure to reduce the VC upper bounds caused us to reconsider this problem and
led us to show that for any VC capacity $V$ there exist architectures $\mathcal{N}_1$ and
$\mathcal{N}_2$ of this capacity such that for $\mathcal{N}_1$ we need $n = O([...])$ to
select a net $\eta^*$ whose performance is within $\epsilon$ of the performance of the best
net $\eta^0$ in $\mathcal{N}_1$, while for $\mathcal{N}_2$ we require a sample size
$n = O(V/\epsilon)$. This latter result is within a factor of
$\frac{1}{\epsilon} \log \frac{1}{\epsilon}$ of the VC bound. More recent research improved
these estimates by studying the baseline case of a single linear-threshold-unit (perceptron)
operating on normally distributed data. We found a necessary relationship between these
parameters of

$$n \geq \frac{V}{70\,\epsilon^2}.$$

Hence, any universal VC-type bound must be at least as great as this and it must be
$O(V/\epsilon^2)$.
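
To make the size of the remaining gap concrete, the short calculation below compares the VC upper estimate with the perceptron lower bound. It assumes the display above reads $n \geq V/(70\,\epsilon^2)$ and writes the unspecified constant of the VC estimate as a hypothetical $c$; neither the constant nor the labels $n_{\mathrm{VC}}$ and $n_{\mathrm{lower}}$ appear in the report.

```latex
\[
\frac{n_{\mathrm{VC}}}{n_{\mathrm{lower}}}
  = \frac{c\,\dfrac{V}{\epsilon^{2}}\,\ln\dfrac{1}{\epsilon}}
         {\dfrac{V}{70\,\epsilon^{2}}}
  = 70\,c\,\ln\frac{1}{\epsilon},
\qquad
\epsilon = 0.1:\quad 70\,c\,\ln 10 \approx 161\,c .
\]
\[
V = 100,\ \epsilon = 0.1:\qquad
n_{\mathrm{lower}} = \frac{100}{70 \times 0.01} \approx 143 .
\]
```

Under these assumptions the ratio does not depend on $V$, so the two bounds differ only through the constant and the logarithmic factor.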

[...] reported that the mathematician M. Talagrand had obtained
the best possible VC bounds, and we were finally able to obtain a preprint of his paper this
past winter. Talagrand obtains the best possible exponents, and his results prove that the
$\log(1/\epsilon)$ factor is not needed. However, his constants are undetermined and it is
impossible to obtain quantitative conclusions from his results without a large amount of
difficult work. Talagrand recognizes this and leaves the necessary calculations to "those
with a taste for [...]". It is clear from our result quoted above for the perceptron that
the issue has become one of the constants. Our lower bound and his upper bound are both
$O(V/\epsilon^2)$. Our research [...] continues, though we have yet to discern a general
argument for reducing the VC bounds. Indeed, we have come to suspect, and are attempting to
verify, that the VC bounds are roughly correct for the problem they address of uniform
approximation. The gap between theory and practice may arise from the difference between the
sample size required to guarantee that all networks will have training set performance
within $\epsilon$ of their true statistical performance and the sample size required to
guarantee this for just the network selected by the training algorithm. The difficulty is in
assessing this quantity.
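
The distinction between a guarantee for all networks and a guarantee for only the selected network can be illustrated with a small simulation. Everything in this sketch is an assumption chosen for illustration and is not taken from the report: the "networks" are one-dimensional threshold classifiers, inputs are uniform on [0, 1), labels are flipped with probability 0.2, and selection is by minimum training error.

```python
import random


def true_error(t, noise=0.2):
    """Exact error of the rule 'predict 1 if x >= t' when the clean label
    is 1 for x >= 0.5 and each label is flipped with probability `noise`."""
    return noise + (1.0 - 2.0 * noise) * abs(t - 0.5)


def draw_sample(n, rng, noise=0.2):
    """Draw n labelled points (x, y) from the assumed noisy model."""
    sample = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x >= 0.5 else 0
        if rng.random() < noise:
            y = 1 - y
        sample.append((x, y))
    return sample


def empirical_error(t, sample):
    """Fraction of training points misclassified by threshold t."""
    wrong = sum(1 for x, y in sample if (1 if x >= t else 0) != y)
    return wrong / len(sample)


def deviations(n, thresholds, rng):
    """Return (worst deviation over the family, deviation of the selected rule)."""
    sample = draw_sample(n, rng)
    emp = [empirical_error(t, sample) for t in thresholds]
    devs = [abs(e - true_error(t)) for e, t in zip(emp, thresholds)]
    selected = min(range(len(thresholds)), key=lambda i: emp[i])
    return max(devs), devs[selected]


rng = random.Random(0)
thresholds = [i / 50 for i in range(51)]
results = [deviations(200, thresholds, rng) for _ in range(100)]
avg_uniform = sum(u for u, _ in results) / len(results)
avg_selected = sum(s for _, s in results) / len(results)
print(f"average worst-case deviation over the family: {avg_uniform:.3f}")
print(f"average deviation of the selected rule:       {avg_selected:.3f}")
```

By construction the deviation of the selected rule can never exceed the worst-case deviation over the family, so a sample size that controls the former may be smaller than one that controls the latter. How much smaller is exactly the quantity the text describes as difficult to assess.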

Our second direction of research is in the application of neural networks to prediction
and to classification. We have studied the use of discrete-valued nodes (binary or ternary
valued) as pattern classifiers (Fine [1991]). We are also engaged in an application of neural
networks having the usual logistic units to forecasting demand for electric power to be
supplied by a large electric utility over the next 24 hours (Yuan and Fine [1992, 1993]).
Such forecasts are of economic importance and impact unit commitment and possible
purchases from, and sales of power to, other utilities. Experience with nets composed
of only discrete-valued nodes led us to an architecture combining a linear predictor with
a small net of ternary-valued nodes to nonlinearly forecast the residuals. However, we
encountered sparsity problems with our training method and abandoned this direction in
[...] the logistic nodes are actually operating over their linear region and can be combined
into [...] such designs yield reliable nonlinear forecasters, for the number
of weights to be trained was usually comparable to the number of training samples.
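
The idea of a linear predictor whose residuals are forecast by ternary-valued nodes can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the report: it uses a single scalar input, a single ternary node with hand-chosen weight, bias, and dead-zone width, synthetic data, and a per-state mean residual as a stand-in for the unspecified training method. All function names are hypothetical.

```python
def ternary_node(x, weight=1.0, bias=0.0, width=1.0):
    """Ternary-valued node: +1, 0, or -1 depending on the weighted input."""
    s = weight * x + bias
    if s > width:
        return 1
    if s < -width:
        return -1
    return 0


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b for scalar inputs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def fit_residual_table(xs, residuals):
    """Mean residual for each ternary state (assumed stand-in for training).
    A state with no training points gets a correction of 0."""
    totals = {-1: [0.0, 0], 0: [0.0, 0], 1: [0.0, 0]}
    for x, r in zip(xs, residuals):
        cell = totals[ternary_node(x)]
        cell[0] += r
        cell[1] += 1
    return {k: (t / c if c else 0.0) for k, (t, c) in totals.items()}


def step(x):
    """Synthetic nonlinearity added to a linear trend (illustrative only)."""
    return 1.0 if x > 1 else (-1.0 if x < -1 else 0.0)


xs = [i / 4 for i in range(-12, 13)]
ys = [2.0 * x + 1.0 + step(x) for x in xs]

a, b = fit_line(xs, ys)
residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
table = fit_residual_table(xs, residuals)


def forecast(x):
    """Linear prediction plus the ternary-node residual correction."""
    return a * x + b + table[ternary_node(x)]


sse_linear = sum(r * r for r in residuals)
sse_combined = sum((y - forecast(x)) ** 2 for x, y in zip(xs, ys))
print(f"squared error, linear only:     {sse_linear:.4f}")
print(f"squared error, with correction: {sse_combined:.4f}")
```

Subtracting the mean residual within each ternary state cannot increase the training squared error, which is why the combined forecaster fits the training data at least as well as the linear predictor alone.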

To assist our design of small networks, we developed a new feature selection criterion
[...] a large number (about fifty) of potential network inputs to [...]
benchmark our neural net design against the [...] methods employed
by electric utilities [...]